Systems and methods for identifying claims in electronic text

ABSTRACT

A system, method, and computer program for identifying claims associated with electronic text are provided. In an approach, electronic text is accessed. Linguistic content associated with the electronic text is identified. A linguistic structure is generated based on the linguistic content identified. The linguistic structure is compared to a claim template. A claim is identified based on the comparison.

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application claims the benefit under 35 U.S.C. 120 as a continuation of application Ser. No. 12/006,716, filed Jan. 4, 2008, which claims the benefit of provisional application 60/878,880, filed Jan. 5, 2007, the entire contents of which are hereby incorporated by reference for all purposes as if fully set forth herein.

BACKGROUND

1. Field of the Invention

The present invention relates generally to natural language processing, and more particularly to systems and methods for identifying claims in electronic text.

2. Background Art

Conventionally, natural language processing systems may be utilized to process electronic text. Natural language processing may involve, for example, handling various file formats and character encoding schemes, part-of-speech tagging, syntactic parsing, and so forth. Reasons for processing the electronic text range from storing and retrieving information to evaluating the electronic text to create and manage taxonomies.

Vast numbers of claims are made every year by numerous organizations, companies and individuals. For example, millions of product and service claims are made every year by many thousands of companies marketing through various communications channels. Governments, agencies and politicians regularly communicate claims to the general public and voters. Increasingly, marketing, advertising, communications and messaging are conducted through electronic channels, including traditional channels such as television and radio, and emerging electronic channels, such as the Internet, as well as cell phones and other handheld or wireless communications devices.

Claims of various kinds are of interest to millions of shoppers, marketing and purchasing professionals, public relations and communications professionals, business and political strategists, various organizations, the general public, and regulators, such as the FTC, FDA and SEC. For example, product and service claims may be used by shoppers to find suitable products and make buying decisions, by marketers to assess competitive offerings and position product offerings, by purchasing agents to support purchasing decisions and contracts, and by regulators to find and stop deceptive advertising and marketing practices. Political claims may be used by political strategists and candidates to characterize opponents and stake out positions on issues. Other kinds of claims are useful to various audiences. Claims may be located and analyzed via manual review or search engines. Unfortunately, manual review and analysis of the electronic text is typically time consuming and inconsistent, and search engines often produce voluminous results.

SUMMARY OF THE INVENTION

A system, method, and computer program for identifying claims associated with electronic text are provided. Electronic text is accessed. Linguistic content associated with the electronic text is identified. A linguistic structure is generated based on the linguistic content identified. The linguistic structure is compared to a claim template. A claim is identified based on the comparison.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram of an exemplary environment for identifying claims;

FIG. 2 is a diagram of an exemplary environment for utilizing identified claims to support user applications;

FIG. 3 shows an exemplary claims engine;

FIG. 4 is a diagram illustrating an exemplary representation of an identified claim;

FIG. 5 shows a screen shot illustrating exemplary claims identification from electronic text;

FIG. 6 is a flow diagram of an exemplary process for identifying claims associated with electronic text; and

FIG. 7 shows an exemplary computing device.

DESCRIPTION OF EXEMPLARY EMBODIMENTS

Referring now to FIG. 1, an exemplary environment for identifying claims is shown. One or more users 102, such as user A 102A, user B 102B, through user N 102N, may communicate with a claims engine 104 directly or via a network 106.

The claims engine 104 accesses various electronic text via the network 106 or otherwise, and ultimately identifies claims from the electronic text (sources of electronic text are discussed further in connection with FIG. 2). The claims may comprise any type of statement or message intended to impart information to, or influence, recipients of the statement or message. Such recipients may include potential buyers or other participants in a marketplace. For example, the statement “Goji juice cures cancer” in electronic media may comprise a consumer-targeted product claim. The statement “John McCain supports immigration reform” in an advertisement may comprise a political claim targeted to voters. Many other kinds of claims exist.

Within product and service claims, many variants exist, including but not limited to: performance claims, energy efficiency claims, financial claims, and environmental or “green” claims, i.e. claims that a product or service is made and acts in a neutral or positive manner with respect to the earth and natural systems. Some claims have legal meanings and significance. For example, specific types of claims may comprise “health benefit claims”, i.e. claims to treat, cure, mitigate, prevent or diagnose disease; “structure/function claims”, i.e. claims to maintain normal function of the body; “safety claims”, i.e. claims that a product or service is safe to use; “efficacy claims”, i.e. claims that a product or service is effective for its intended use; and “nutrient content and health claims”, i.e. claims that a food provides a health or nutritional benefit.

The network 106 may comprise the Internet, a local area network, a peer to peer network, and so forth. Alternatively, the claims engine 104 may access the electronic text locally, such as from a storage medium located on the same computer on which the claims engine 104 resides.

The users 102 may access the claims engine 104 directly or via the network 106, as discussed herein. The claims engine 104 may communicate with claims storage 108. The users 102 may also communicate with the claims storage 108 directly or via the network 106. The claims storage 108 may comprise more than one storage medium, claims databases, and so forth, according to exemplary embodiments. The users 102 may perform a search of the claims storage 108 via the claims engine 104, such as by querying the claims engine 104 for specific claims. Alternatively, the users 102 may perform a search of the claims storage 108 directly. The query may be based on attributes associated with the claims, for example. The claims identified by the claims engine 104 may be analyzed and presented to the users 102 in any manner. Analysis and presentation of the claims are discussed further in connection with FIGS. 3 through 6.

According to some embodiments, data mining and rules-based algorithms may be employed. A plurality of external standards may be incorporated as a reference for assessment or evaluation of the claims and associated attribute values, discussed herein. Queries comprising a plurality of attribute values, and a succession of such queries, may be used to target and narrow results. Results may be displayed on a plurality of display devices.

FIG. 2 is a block diagram of an exemplary environment for utilizing identified claims to support user applications. One or more information sources 202, such as 202A through 202N, associated with the electronic text may be accessed directly or via the network 106, as discussed in FIG. 1. The information sources 202 may include but are not limited to Web sites, RSS feeds, email, electronic text transcripts of television or radio, electronic text derived from audio recordings or voice communications via voice-to-text technology, electronic text derived from documents through optical character recognition (OCR) technology, electronic text derived from images, graphics, video or streaming video, and the like.

The electronic text may originate from various information sources 202 and may derive from various kinds of documents or media, such as press releases, product labels, product information and specification sheets, marketing literature, advertisements, articles, blogs, movies, songs or other forms of unstructured text. Further, the electronic text may be received through a plurality of media channels, such as the Internet, television, radio, RSS feeds, newspapers, magazines, blogs, email, and so forth, or physically received on a storage medium such as a compact disc. The volume of electronic text analyzed may be small, as in a single document, or very large, such as the Internet. The analysis may be an isolated instance, or be continuous or recurring. Various and multiple natural language and linguistic processing techniques, such as information extraction, may be utilized by the claims engine 104. Various and multiple key attributes associated with the claims identified by the claims engine 104 may also be extracted.

The claims engine 104 may identify claims from the electronic text, and may store the claims in claims storage 108, in a database associated with the claims storage 108 discussed in FIG. 1, or any other type of storage. Further, any mechanism for organizing the information associated with a storage medium may be utilized. As discussed herein, more than one database or other storage medium may be utilized to store the linguistic content, the linguistic structures, and/or the identified claims. For example, a distributed storage system may be utilized for storing and accessing the identified claims and associated linguistic content and linguistic structures.

A user application 204 may communicate with the claims storage 108 and/or with the claims engine 104. Various user applications 204 may be in communication with the claims storage 108. Each application may utilize the claims stored in the claims storage 108. The user applications 204 may include, but are not limited to: a shopping application, which may utilize the claims to direct potential purchasers to suitable products or services, or push products and services to targeted potential buyers; an advertising application, which may utilize the claims to direct an advertisement to targeted recipients; a buyer-seller matching application, which may utilize the claims to match appropriate buyers and sellers in ecommerce or electronic auctions; a compliance application, which may utilize the claims to find and identify noncompliant or illegal advertising or marketing practices; and a procurement application, which may utilize the claims to identify products and services meeting procurement specifications. One or more users 102 may interact with the user application 204 in order to utilize the claims for various purposes discussed herein, such as to locate products, perform analysis, and so forth. In alternative embodiments, the users may access the user application 204 via the claims engine 104.

FIG. 3 shows a block diagram of an exemplary claims engine, such as the claims engine 104 discussed in FIG. 1. A communications interface 302 is provided for facilitating communication between the users 102 and the claims engine 104.

The claims engine 104 may also include a monitoring module 304. The monitoring module 304 accesses the electronic text, either via the network 106 or locally, such as by accessing the electronic text from the same device that comprises the monitoring module 304. The monitoring module 304 may search the network 106, storage, databases, the information sources 202 discussed in FIG. 2, and so forth in order to access the electronic text. The monitoring module 304 may include a scheduler for specifying times, increments of times, and so forth for the monitoring of the electronic text to be performed. Accordingly, claims in the electronic text may be identified during the monitoring or after the monitoring.

According to some embodiments, the monitoring schedule may be different for different sources of electronic text. For example, websites that tend to include a large number of claims may be monitored more frequently. Conversely, sources of claims determined to be consistent with regulatory standards associated with claims may be monitored less frequently. The monitoring module 304 may utilize various information discovery tools including searching, sorting, organizing, and browsing by full text or fields (attributes).
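By way of illustration only, a source-dependent monitoring schedule of the kind described above might be sketched as follows. The class name, the field names, the seven-day base interval, the cutoff of ten claims, and the scaling factors are all hypothetical values chosen for the example and do not appear in the embodiments.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class MonitoredSource:
    """One source of electronic text (all fields are hypothetical)."""
    url: str
    claims_found: int       # claims identified in earlier monitoring passes
    compliant: bool         # True if earlier claims met the relevant standards
    last_checked: datetime


def monitoring_interval(source, base=timedelta(days=7)):
    """Return how long to wait before monitoring this source again."""
    if source.compliant:
        return base * 4     # compliant sources are monitored less frequently
    if source.claims_found >= 10:   # assumed cutoff for "a large number of claims"
        return base / 7     # claim-heavy sources are monitored more frequently
    return base


def due_sources(sources, now):
    """Return the sources whose monitoring interval has elapsed."""
    return [s for s in sources if now - s.last_checked >= monitoring_interval(s)]
```

A scheduler could call due_sources periodically and pass the result to whatever component retrieves the electronic text.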

A linguistic structure module 306 identifies linguistic content associated with the electronic text. For example, identifying linguistic content may comprise identifying one or more linguistic features in the text and identifying one or more relationships between linguistic features. Identifying linguistic features may comprise, but is not limited to: tokenizing electronic text into segments of text, for example, identifying words, punctuation, white space, capital and lower case letters, sentences and so forth; identifying words for their part of speech, such as nouns, verbs, adjectives, adverbs, and so forth; identifying groups of words, such as noun phrases, verb phrases, and prepositional phrases; and identifying entities, such as people, places, and things, for example, identifying specific diseases in a health-related domain. Other characteristics of the electronic text may be identified as linguistic features, and thus may be part of the linguistic content. Identifying relationships between linguistic features may comprise, but is not limited to: identifying subjects, verbs and objects of sentences and therefore identifying subject-verb-object relationships; and identifying semantic roles of the sentence parts and therefore semantic relationships between the sentence parts. Other types of relationships identified may include, but are not limited to, grammatical, co-referring, semantic, and discourse relationships. A linguistic structure is generated based on the linguistic content identified by the linguistic structure module 306. A linguistic structure is any combination of two or more linguistic features, or a combination of at least one feature and one relationship. Linguistic structures may be stored in a linguistic database in the linguistic structure module, in the claims storage 108, or in any other storage. Linguistic structures may be indexed in the claims storage 108 or any other storage to enable rapid and efficient retrieval.
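By way of illustration only, the definition of a linguistic structure given above (two or more linguistic features, or at least one feature and one relationship) might be represented in memory as sketched below. The class names, field names, and label strings are hypothetical and are not part of the embodiments.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Feature:
    """A linguistic feature, e.g. a word, phrase, or entity with a label."""
    text: str
    kind: str   # hypothetical labels such as "verb", "noun_phrase", "disease"


@dataclass(frozen=True)
class Relationship:
    """A relationship between two features, e.g. subject-verb."""
    kind: str
    head: Feature
    dependent: Feature


@dataclass
class LinguisticStructure:
    features: list = field(default_factory=list)
    relationships: list = field(default_factory=list)

    def is_valid(self):
        # Two or more features, or at least one feature and one relationship.
        return len(self.features) >= 2 or (
            len(self.features) >= 1 and len(self.relationships) >= 1
        )
```

For example, the features “Goji juice” and “cures” joined by a subject-verb relationship would form a valid structure under this sketch, whereas a single isolated feature would not.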

The linguistic structure module 306 may also comprise information extraction agents that extract other information associated with the claim from the electronic text. Such other information may include, but is not limited to, people, places, things or times. For example, a company name or a person's name may be extracted, even though a company name or a person's name does not constitute a claim. Other information may be stored with the claims.

A template module 308 includes one or more claim templates, such as structured claim templates or statistical claim templates. Statistical claim templates may comprise statistical models associated with claims and/or text examples. For example, in looking for “health benefit claims” of interest to the FDA, an example structured claim template for “health benefit claim” may be defined as a set of criteria containing: (1) presence of a “change verb”, defined as any verb appearing in WordNet that is a hyponym of the verb “change” (an external reference); (2) presence of a “disease term”, defined as any term appearing as a node in the Diseases branch in the Medical Subject Headings tree (MeSH, also an external reference); and (3) presence of a predicate-object relationship between the “change verb” and the “disease term” in which the “disease term” is the syntactic object of the “change verb.” Statistical claim templates may include statistical models of linguistic features associated with claims and/or text examples. Various linguistic forms exist for claims. The template module 308 may generate or access a template for each form. Templates may be stored in a query database contained in the template module, in the claims storage 108, or in any other storage. The templates may be utilized to identify claims from the electronic text based on the linguistic content.
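By way of illustration only, the three criteria of the example “health benefit claim” template might be checked as sketched below. The two small word sets are hand-written, hypothetical stand-ins for a WordNet hyponym lookup and a MeSH Diseases branch lookup, and have not been verified against either resource. The input is assumed to be a list of (verb, object) pairs already produced by a parser.

```python
# Hypothetical stand-in for "verbs that are hyponyms of 'change' in WordNet".
CHANGE_VERBS = {"cure", "treat", "shrink", "heal", "reduce"}

# Hypothetical stand-in for "terms that are nodes in the MeSH Diseases branch".
DISEASE_TERMS = {"cancer", "diabetes", "arthritis", "tumor"}


def matches_health_benefit_template(predicate_object_pairs):
    """Return True if any (verb, object) pair satisfies all three criteria.

    Each pair is assumed to encode a predicate-object relationship in which
    the second element is the syntactic object of the first (criterion 3).
    """
    for verb, obj in predicate_object_pairs:
        is_change_verb = verb.lower() in CHANGE_VERBS    # criterion 1
        is_disease_term = obj.lower() in DISEASE_TERMS   # criterion 2
        if is_change_verb and is_disease_term:
            return True
    return False
```

Under these assumed word sets, the pair ("cure", "cancer") would satisfy the template, while ("sell", "cancer") and ("cure", "thirst") would not.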

The claims engine 104 may include an analysis module 310. The analysis module 310 may analyze the claims identified by the claims engine 104. The analysis may be utilized to recognize trends, patterns, outliers, correlations, relevance, similarity, and other attributes, such as whether a particular claim is likely to be false, misleading, or illegal. The analysis module may utilize external standards or information to evaluate claims, such as FDA-approved labels and product information sheets to identify an “off-label” claim.

The analysis module 310 may include data mining agents for automatically analyzing claims. The data mining agents may consist of a series of software algorithms that query the claims storage 108, retrieve lists or records, and perform some analysis (such as statistical analysis, pattern matching, further NLP, etc.). Accordingly, the monitoring module 304 and the analysis module 310 may be utilized to provide a partially or completely automated claims analysis system, such as by performing an automated search and identification of the electronic text and the identification and analysis of the claims associated with the electronic text. The analysis module 310 may utilize the data mining agents to generate reports for further analysis, or the analysis module 310 may augment records of the claims by storing information to the claims storage 108.

A claims modification module 312 may also be provided for modifying the claims. The claims may be modified in accordance with regulatory standards, such as regulations set forth by the Food and Drug Administration (“FDA”) or the Federal Election Commission. Alternatively, the claims modification module 312 may suggest modifications to the claims detected in the electronic media, for example, to a provider of the electronic media, such as a website sponsor that displays the electronic media including the claims. The monitoring module 304 may monitor the electronic media specifically for claims that should have been modified based on suggestions, for example, from a government body or other authority.

A claims presentation module 314 may display the claims to users, such as analysts, shoppers, procurement professionals, regulators, strategists, communications specialists, researchers, and so forth. The display may be in response to a query for types of claims, for example. Any type of presentation may be provided by the claims presentation module 314. According to some embodiments, claim alerts based on user-specified criteria can be set up and accessed via a web interface or pushed via email message. For example, a campaign manager may be alerted to the dissemination of a political claim made by an opponent.

The results of a query may be displayed in a variety of ways. For example, a results set may be presented as a list of claims meeting the query. For each claim, a summary of the claim and the associated information may be generated. There may be links from the summary to a cached copy of the text, in which claims are highlighted for easier examination. Highlighted claims are shown in FIG. 5, discussed herein. There may also be links from the summary to original text.

Although various modules are shown in FIG. 3, fewer or more modules may comprise the claims engine 104 and still fall within the scope of various embodiments. For example, a claims indexer (not shown) may also comprise the claims engine 104 according to some embodiments.

FIG. 4 is a chart illustrating an exemplary representation of an identified claim. An example claim template is a structured claim template for an ability claim. Claim templates may be of other types, such as a statistical claim template, and may be for claims of a different type, such as an identity claim.

Example text 402 comprises the electronic text discussed herein. Although the electronic text may include more than the example text 402 shown in FIG. 4, the example text 402 represents language that may be identified as potential claim language by the claims engine 104.

The example text 402 is utilized to generate example linguistic structures 404. The example linguistic structures 404 may include linguistic features such as parts of speech and identified entities such as ingredients, products and diseases, and may include relationships between linguistic features such as subject-predicate-object relations, and so forth.

The example linguistic structures 404 are then compared to an example claim template, such as a structured claim template for an ability claim 406, to identify the claims. As discussed herein, the linguistic structures 404 may alternatively be statistically analyzed for a match with a statistical claim template based on a training set of text examples. Based on the comparison, the claim may be identified. For example, if the linguistic structures 404 match the ability claim template 406 in FIG. 4, the example text 402, or a portion thereof, may be identified as a claim. According to some embodiments, claim language may then be extracted from the electronic text and may be stored. The linguistic structures 404 and/or the claim templates may also be stored.

In FIG. 4, each example text 402 is comprised of a sentence with a subject, predicate and object. The resulting example linguistic structures 404 generated thus each contain a subject, predicate and object. Each example of the linguistic structures 404 is then compared to the example claim template 406. The example claim template 406 contains a subject slot, a predicate slot and an object slot. The slots are related in a subject-predicate-object relationship. The claim template 406 further defines that the subject slot contain a product or ingredient, that the predicate slot contain a “change verb”, i.e. a term that appears in WordNet (an external reference), as a hyponym for the verb “change”, and that the object slot contain a “disease term” defined as a node in the Disease branch of the Medical Subject Headings tree (MeSH, a second external reference). In FIG. 4, the example text 402 including the text “Doctors treat cancer with surgery” is not identified as a claim because while a subject-predicate-object relationship is present, and the specific predicate and object meet the criteria set forth in the example claim template 406, the subject is identified as a person, not a product or ingredient as required by the template.

The text example containing the text “Cancer can also be treated by drinking Goji Juice” is identified as a claim because the text has a subject, predicate and object, all in a subject-predicate-object relationship, and the subject contains a term identified as an ingredient, the predicate contains a term that is a “change verb”, i.e. a hyponym of the verb “change” in WordNet, and the object is a “disease term”, i.e. a word found as a node in the MeSH Disease branch. A claim may therefore be extracted from the electronic text and may be represented, as in FIG. 4. The text example containing the text “Life Elixir can actually shrink tumor cells” is also identified as a claim, and may be extracted, because it too meets all criteria of the example claim template 406. Other linguistic templates may have more or fewer criteria and may have different criteria, based upon various and different linguistic content, and may be of various types, such as an “identity claim template”. The linguistic structures 404 may or may not match other claim templates according to greater or fewer criteria, different criteria, and different template types.
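By way of illustration only, the comparison of the three example texts with the ability claim template 406 might be sketched as below. The subject types, normalized predicates, and objects are assigned by hand here to mirror the description, since FIG. 4 itself is not reproduced; the word sets are hypothetical stand-ins for the WordNet and MeSH lookups, and every name in the sketch is an assumption.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A subject-predicate-object structure with a typed subject."""
    subject: str
    subject_type: str   # hypothetical entity label, e.g. "person", "product"
    predicate: str      # verb reduced to its base form
    obj: str


CHANGE_VERBS = {"treat", "shrink", "cure"}       # stand-in for WordNet lookup
DISEASE_TERMS = {"cancer", "tumor cells"}        # stand-in for MeSH lookup
ALLOWED_SUBJECT_TYPES = {"product", "ingredient"}


def matches_ability_template(t):
    """Check the subject, predicate, and object slots of the template."""
    return (
        t.subject_type in ALLOWED_SUBJECT_TYPES
        and t.predicate in CHANGE_VERBS
        and t.obj in DISEASE_TERMS
    )


examples = [
    Triple("Doctors", "person", "treat", "cancer"),
    Triple("Goji Juice", "ingredient", "treat", "cancer"),
    Triple("Life Elixir", "product", "shrink", "tumor cells"),
]

# Subjects of the structures that satisfy every slot of the template.
claims = [t.subject for t in examples if matches_ability_template(t)]
```

In this sketch the “Doctors” structure fails only on the subject slot, which corresponds to the reason given above for not identifying that text as a claim.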

As discussed herein, various other claim templates may be utilized to identify claims, such as identity, attribute, attribution, and superiority claim templates. Examples of other templates include an identity claim template that may take the form “Goji Juice is a cancer-killer,” an attribute claim template that may take the form “Goji Juice has cancer-killing properties,” an attribution claim template that may take the form “4 out of 5 doctors recommend Goji Juice for cancer”, and a superiority claim template that may take the form “Goji Juice is better than chemotherapy for treating cancer.” As discussed herein, any claim templates, such as structured templates or statistical templates, may be employed for comparison with the linguistic structures 404 and identification of the claims.

As discussed herein, a claim may be identified from the comparison between the linguistic structures and the claim templates on the basis of a statistical probability rather than an exact match. For example, a claim may be identified because the criteria set forth in a statistical claim template have a 75% chance of being satisfied by one or more of the linguistic structures.
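By way of illustration only, a probability-based identification might be sketched as a simple logistic score compared against a threshold. The feature names, weights, and bias below are invented for the example; a statistical claim template would instead be derived from a training set of text examples. Treating the 75% figure as a cutoff is also an assumption of this sketch.

```python
import math

# Hypothetical weights; in practice these would be learned from text examples.
WEIGHTS = {
    "has_change_verb": 1.5,
    "has_disease_term": 1.5,
    "subject_is_product": 1.0,
}
BIAS = -2.5
THRESHOLD = 0.75   # assumed cutoff, echoing the 75% example above


def claim_probability(features):
    """Map the set of observed feature names to a probability in (0, 1)."""
    score = BIAS + sum(w for name, w in WEIGHTS.items() if name in features)
    return 1.0 / (1.0 + math.exp(-score))


def is_claim(features, threshold=THRESHOLD):
    """Identify a claim when the probability meets the threshold."""
    return claim_probability(features) >= threshold
```

With these invented weights, a structure exhibiting all three features clears the threshold, while one exhibiting only the change verb and the disease term does not.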

As discussed herein, the claims identified based on the comparison between the linguistic structures and the claim templates may be stored in the claims storage 108 for later analysis or access. According to some embodiments, any user may access the claims for evidence, support, research, or for any other reason.

FIG. 5 shows a screen shot illustrating exemplary claims identification from electronic text. A portion of electronic text 502, representative of a larger body of electronic media associated with “Herbal Promise, LLC” is shown in FIG. 5. The claims engine 104 identifies disease-related claims 508 from the electronic media, based on comparison of generated linguistic structures to a claim template, such as a structured claim template or a statistical claim template, as discussed herein in connection with FIG. 4. Strings of suspicious text 504 containing claims are also displayed. Additional information associated with the identified claims may also be displayed. For example, supplements and other ingredients 506 may be extracted and displayed. The company offering the product or service, and its contact information, may also be extracted and displayed. According to some embodiments, the source of the text, for example a URL (if a website is the source) may be displayed. This additional information may also be stored in the claims storage 108.

The claims identified along with other related information may be stored in the claims storage 108, which may then be searched, as shown in FIG. 5. The search may be performed by specifying a date range 510, by checking disease claim search boxes 512, and/or by checking disease type boxes 514. Any other type of search parameters may be provided in addition to or instead of the search parameters (e.g., the date range 510, the disease claim search boxes 512, and the disease type boxes 514) shown in FIG. 5.

FIG. 6 shows a flow diagram of an exemplary process for identifying claims associated with electronic text. At step 602, electronic text is accessed. The access may occur by various means, such as monitoring the electronic text via a network, such as a wide area network, a local area network, a peer to peer network, and so forth. Alternatively, the electronic text may be accessed directly on a client device or directly from the information sources 202 discussed in FIG. 2. As discussed in FIG. 3, the monitoring module 304 may be provided for scheduling monitoring according to user-specified time increments or any other schedule. As also discussed herein, any information sources 202 may be utilized to derive the electronic text.

At step 604, linguistic content associated with the electronic text is identified. For example, linguistic features may be identified by: breaking down or tokenizing the electronic text into words, sentences, punctuation, white spaces, and so forth; identifying words for their part of speech, such as nouns, verbs, adjectives, adverbs, and so forth; identifying word groups constituting phrases; and identifying entities, such as places, people, companies, times, products, and so forth.
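By way of illustration only, the tokenizing portion of step 604 might be sketched with regular expressions as below. The splitting rules are deliberately naive assumptions (for instance, an abbreviation such as “Dr.” would wrongly end a sentence), and part-of-speech tagging, phrase grouping, and entity identification are not shown.

```python
import re


def split_sentences(text):
    """Naively split text after '.', '!' or '?' when followed by white space."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Return words (allowing internal apostrophes or hyphens) and punctuation."""
    return re.findall(
        r"[A-Za-z0-9]+(?:['-][A-Za-z0-9]+)*|[^\sA-Za-z0-9]", sentence
    )
```

Applied to the text “Goji juice cures cancer. Order today!”, this sketch yields two sentences, and the first sentence yields four word tokens followed by one punctuation token.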

Using semantic role labeling, or other techniques, the words, phrases, and/or entities may be labeled as sentence parts (e.g., subject, object, predicate), thereby identifying linguistic roles that the words, phrases, and/or entities play and the linguistic relationships between them, thus further identifying linguistic content. For example, the herbal medicine electronic media 502 shown in FIG. 5 may be broken down, parts of speech identified, phrases identified, entities identified, such as product type, disease type, ingredients, and so forth, and linguistic roles and relationships of the words, phrases, and other parts identified, such as subject-predicate relations and verb-object relations.

According to some embodiments, natural language processing is utilized to process the electronic text, such as by converting file formats and character encoding schemes, part-of-speech tagging, syntactic parsing, information extraction, automated text categorization, word sense disambiguation, text segmentation, relationship mining, event detection, toponym resolution, and creation and management of taxonomies, lexicons, and knowledge bases.

At step 606, a linguistic structure based on the linguistic content identified is generated, such as the exemplary linguistic structure 404 shown in FIG. 4. The linguistic structures may be comprised of linguistic features such as words, phrases and text strings (including entities) identified as parts of speech, together with relationships of the features, such as subject-verb-object relationships. In some embodiments, relationships are implied by identification of linguistic roles. All linguistic features and relationships may constitute linguistic content. The linguistic structures may also be utilized to represent a collection of concepts that are expressed in the electronic text, for example.

At step 608, the linguistic structures are compared with a claim template. The claim template may comprise a structured claim template, such as the ability claim template discussed in FIG. 4. For example, the template module 308 may utilize the template for “health benefit claim” discussed in connection with FIG. 3. As discussed herein, statistical templates may alternatively be utilized as claim templates.

At step 610, a claim is identified based on the comparison. For example, the electronic text giving rise to linguistic structures that match the predetermined claim template, such as the structured claim template or the statistical claim template discussed herein, may be deemed claims. The electronic text giving rise to linguistic structures that meet pre-set threshold probability criteria set forth by a statistical template may also be identified as one or more claims.

As discussed herein, the claims may also be stored for future access. The claims may be stored in the claims storage 108 discussed in FIG. 1, in a database, or on any storage medium. Storing and retrieving information may also be accomplished via full text search engines and database management systems, according to some embodiments. According to exemplary embodiments, the claims may be accessed via the claims storage 108 and automatically analyzed and presented to a user. For example, data mining agents may run analysis whenever new data is added or on a scheduled basis. For example, the user may recognize noncompliant product information, deceptive ads or other illegal practices, and so forth using the automatic analysis of the claims. Identified claims may be analyzed for a wide variety of other purposes, such as in shopping applications.

After the claims and/or other associated information, such as the additional information 506 shown in FIG. 5, are identified, data mining and rules-based algorithms may be applied to analyze patterns, trends, outliers, correlations, relevance, similarity, comparison to standards, and other characteristics. Such additional analysis may also be displayed along with the relevant claim(s). For example, claims categorized as health benefit claims may be analyzed for their likelihood to be potentially violative (illegal) under Food and Drug law. Such potentially violative claims may be presented as a ranked list according to likelihood of violative content.
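By way of illustration only, presenting potentially violative claims as a ranked list might be sketched as below. The class name, the field names, and the idea of a numeric likelihood between 0 and 1 are assumptions of the example; how such a likelihood would be computed is not addressed here.

```python
from dataclasses import dataclass


@dataclass
class ScoredClaim:
    text: str
    violation_likelihood: float   # hypothetical score between 0 and 1


def rank_claims(claims):
    """Return the claims ordered from most to least likely to be violative."""
    return sorted(claims, key=lambda c: c.violation_likelihood, reverse=True)
```

Because the sort is stable, claims with equal scores keep the order in which they were supplied.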

The identified claims, as well as other information associated with the claims, may be presented using natural language generation tools, text summarization, or information visualization systems, for example. As discussed herein, any type of presentation of the claims identified by the claims engine 104 may be utilized.

According to some embodiments, the claims identified by the claims engine 104 may be automatically “red-flagged.” For example, instances in which claims have a high likelihood of violating the law may be red-flagged, e.g., supplement claims about diagnosis, cure, mitigation, treatment and prevention of disease. Further, the monitoring module 304 may automatically conduct “follow-up” monitoring to verify that offending companies associated with the detected claims have complied with FDA requests or orders for corrective action, such as modifications to the claims, as discussed herein.
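One simple way automatic red-flagging could be approximated is sketched below. It flags a claim when a supplied likelihood meets a threshold, or when the text contains a term relating to diagnosis, cure, mitigation, treatment, or prevention. The term list, the 0.9 threshold, and the function name are assumptions for illustration; a keyword match is a far cruder test than the template comparison described herein.

```python
import re

# Hypothetical term list covering diagnosis, cure, mitigation,
# treatment, and prevention; whole-word matches only.
_DISEASE_TERMS = re.compile(
    r"\b(diagnos(?:e|es|is)|cures?|mitigat(?:e|es|ion)"
    r"|treat(?:s|ment)?|prevent(?:s|ion)?)\b",
    re.IGNORECASE,
)


def red_flag(claim_text, likelihood, threshold=0.9):
    # Flag when the score is high enough or a disease-related term appears.
    return likelihood >= threshold or bool(_DISEASE_TERMS.search(claim_text))


flagged = red_flag("This supplement cures arthritis", likelihood=0.1)
```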

FIG. 7 shows an exemplary computing device. The computing device may comprise the claims engine 104 or computing devices associated with the users 102 according to some embodiments. The computing device includes a communications interface 702, a processor 704, a memory 706, and storage 708, which are all coupled to the bus 710. The bus 710 provides communications between the communications interface 702, the processor 704, the memory 706, and the storage 708.

The processor 704 executes instructions. The memory 706 permanently or temporarily stores data. Some examples of the memory 706 are RAM and ROM. The storage 708 also permanently or temporarily stores data. Some examples of the storage 708 are hard disks and disk drives.

The embodiments discussed herein are illustrative. As these embodiments are described with reference to illustrations, various modifications or adaptations of the methods and/or specific structures described may become apparent to those skilled in the art.

The above-described components and functions can be comprised of instructions that are stored on a computer-readable storage medium. The instructions can be retrieved and executed by a processor. Some examples of instructions are software, program code, and firmware. Some examples of storage medium are memory devices, tape, disks, integrated circuits, and servers. The instructions are operational when executed by the processor to direct the processor to operate in accord with the invention. Those skilled in the art are familiar with instructions, processor(s), and storage medium.

While various embodiments have been described above, it should be understood that they have been presented by way of example only, and not limitation. For example, any of the elements associated with the claims engine 104 may employ any of the desired functionality set forth hereinabove. Thus, the breadth and scope of a preferred embodiment should not be limited by any of the above-described exemplary embodiments.

What is claimed is:
 1. A data processing method comprising: receiving a corpus of documents containing one or more product claim annotations; identifying, based on the one or more product claim annotations in the corpus of documents, one or more template linguistic structures likely to indicate presence of a product claim; accessing electronic text; identifying linguistic content associated with the electronic text, wherein the linguistic content includes a plurality of linguistic features; generating a linguistic structure based on the linguistic content identified, wherein the linguistic structure identifies at least a relationship between the plurality of linguistic features; identifying a particular product claim within the electronic text based on comparing the linguistic structure to the one or more template linguistic structures; wherein the method is performed by one or more computing devices.
 2. The method of claim 1, wherein identifying the one or more template linguistic structures comprises using one or more machine learning techniques including: maximum entropy, support vector machine, neural network, nearest neighbor, hidden Markov model, conditional random fields, or maximum entropy Markov model.
 3. The method of claim 1, wherein the one or more product claim annotations identify at least one or more boundaries of a claim within the corpus of documents and a type categorization for the claim.
 4. The method of claim 1, wherein the one or more template linguistic structures specify one or more features including: lexical entities, grammatical relations, semantic meanings, or argumentative structure.
 5. The method of claim 1, wherein identifying the claim within the electronic text involves tagging the claim within the electronic text with one or more annotations identifying the linguistic structure.
 6. The method of claim 5, wherein the one or more product claim annotations specify features including: product name, product type, product benefit, object of the product benefit, or product user category.
 7. The method of claim 1, further comprising: determining whether the particular product claim is misleading based on one or more factors including: absence of risk information, absence of required supporting references, non-compliance with a set of rules, presence of false benefits, incorrect side effect information, omissions of material facts, obfuscated risk disclosure, amount or type of evidence presented for the particular product claim, presence of anecdotal evidence, references to government agency evaluations, or references to historical or traditional use of a product; in response to determining that the particular product claim is misleading, storing data that flags the particular product claim as misleading.
 8. The method of claim 1, further comprising: generating, based on the particular product claim, one or more product recommendations for a user.
 9. The method of claim 8, wherein the one or more product recommendations include one or more of: a list of suitable products or services for the user, an advertisement to the user identifying one or more products, or data specifying one or more sellers which sell the one or more products.
 10. The method of claim 8, wherein generating the one or more product recommendations is based on input supplied by the user.
 11. The method of claim 1, further comprising: generating, based on the particular product claim, one or more of: a product description, a product review, or a comparison between one or more different products.
 12. A computer-readable storage medium storing one or more instructions which, when executed by one or more processors, cause the one or more processors to perform: receiving a corpus of documents containing one or more product claim annotations; identifying, based on the one or more product claim annotations in the corpus of documents, one or more template linguistic structures likely to indicate presence of a product claim; accessing electronic text; identifying linguistic content associated with the electronic text, wherein the linguistic content includes a plurality of linguistic features; generating a linguistic structure based on the linguistic content identified, wherein the linguistic structure identifies at least a relationship between the plurality of linguistic features; identifying a particular product claim within the electronic text based on comparing the linguistic structure to the one or more template linguistic structures.
 13. The computer-readable storage medium of claim 12, wherein the instructions for identifying the one or more template linguistic structures comprise instructions which when executed by the one or more processors cause using one or more machine learning techniques including: maximum entropy, support vector machine, neural network, nearest neighbor, hidden Markov model, conditional random fields, or maximum entropy Markov model.
 14. The computer-readable storage medium of claim 12, wherein the one or more product claim annotations identify at least one or more boundaries of a claim within the corpus of documents and a type categorization for the claim.
 15. The computer-readable storage medium of claim 12, wherein the one or more template linguistic structures specify one or more features including: lexical entities, grammatical relations, semantic meanings, or argumentative structure.
 16. The computer-readable storage medium of claim 12, wherein the instructions which when executed cause identifying the claim within the electronic text comprise instructions which when executed by the one or more processors cause tagging the claim within the electronic text with one or more annotations identifying the linguistic structure.
 17. The computer-readable storage medium of claim 16, the one or more product claim annotations specify features including: product name, product type, product benefit, object of the product benefit, or product user category.
 18. The computer-readable storage medium of claim 12, comprising instructions which when executed by the one or more processors cause performing: determining whether the particular product claim is misleading based on one or more factors including: absence of risk information, absence of required supporting references, non-compliance with a set of rules, presence of false benefits, incorrect side effect information, omissions of material facts, obfuscated risk disclosure, amount or type of evidence presented for the particular product claim, presence of anecdotal evidence, references to government agency evaluations, or references to historical or traditional use of a product; in response to determining that the particular product claim is misleading, storing data that flags the particular product claim as misleading.
 19. The computer-readable storage medium of claim 12, comprising instructions which when executed by the one or more processors cause performing generating, based on the particular product claim, one or more product recommendations for a user.
 20. The computer-readable storage medium of claim 19, wherein the one or more product recommendations include one or more of: a list of suitable products or services for the user, an advertisement to the user identifying one or more products, or data specifying one or more sellers which sell the one or more products.
 21. The computer-readable storage medium of claim 19, comprising instructions which when executed by the one or more processors cause generating the one or more product recommendations is based on input supplied by the user.
 22. The computer-readable medium of claim 12, comprising instructions which when executed by the one or more processors cause generating, based on the particular product claim, one or more of: a product description, a product review, or a comparison between one or more different products.
 23. A data processing method comprising: accessing electronic text; identifying linguistic content associated with the electronic text, wherein the linguistic content includes a plurality of linguistic features; generating a linguistic structure based on the linguistic content identified, wherein the linguistic structure identifies at least a relationship between the plurality of linguistic features; identifying a particular product claim within the electronic text based on comparing the linguistic structure to a claim template; generating, based on the particular product claim, one or more product recommendations for a user; wherein the method is performed by one or more computing devices.
 24. The method of claim 23, wherein the one or more product recommendations include one or more of: a list of suitable products or services for the user, an advertisement to the user identifying one or more products, data specifying one or more sellers which sell the one or more products, a product description, a product review, or a comparison between one or more different products.
 25. The method of claim 23, wherein generating the one or more product recommendations is based on input supplied by the user.
 26. A computer-readable storage medium storing one or more instructions which, when executed by one or more processors, cause the one or more processors to perform: accessing electronic text; identifying linguistic content associated with the electronic text, wherein the linguistic content includes a plurality of linguistic features; generating a linguistic structure based on the linguistic content identified, wherein the linguistic structure identifies at least a relationship between the plurality of linguistic features; identifying a particular product claim within the electronic text based on comparing the linguistic structure to a claim template; generating, based on the particular product claim, one or more product recommendations for a user; wherein the method is performed by one or more computing devices.
 27. The computer-readable medium of claim 26, wherein the one or more product recommendations include one or more of: a list of suitable products or services for the user, an advertisement to the user identifying one or more products, data specifying one or more sellers which sell the one or more products, a product description, a product review, or a comparison between one or more different products.
 28. The computer-readable medium of claim 26, wherein generating the one or more product recommendations is based on input supplied by the user.